The key enzyme in coronavirus polyprotein processing
is the viral main proteinase, Mpro, a protein with
extremely low sequence similarity to other viral and
cellular proteinases. Here, the crystal structure of the
33.1 kDa transmissible gastroenteritis (corona)virus
Mpro is reported. The structure was refined to 1.96 Å
resolution and revealed three dimers in the asymmetric
unit. The mutual arrangement of the protomers
in each of the dimers suggests that Mpro
self-processing occurs in trans. The active site, comprised
of Cys144 and His41, is part of a chymotrypsin-like
fold that is connected by a 16 residue loop to an
extra domain featuring a novel α-helical fold.
Molecular modelling and mutagenesis data implicate
the loop in substrate binding and elucidate S1 and S2
subsites suitable to accommodate the side chains of
the P1 glutamine and P2 leucine residues of Mpro
substrates. Interactions involving the N-terminus and the
α-helical domain stabilize the loop in the orientation
required for trans-cleavage activity. The study illustrates
that RNA viruses have evolved unprecedented
variations of the classical chymotrypsin fold.

Keywords : 3C-like/catalytic dyad/coronavirus/proteinase/ 
X-ray crystallography 


Introduction 

Transmissible gastroenteritis virus (TGEV) belongs to the 
Coronaviridae, a family of positive-strand RNA viruses. 
Coronaviruses have the largest RNA viral genomes known 
to date (28 500 nucleotides in the case of TGEV) and share 
a similar genome organization and common transcriptional
and translational strategies with the Arteriviridae
(den Boon et al., 1991; Cavanagh, 1997). TGEV infection 
is associated with severe and often fatal diarrhoea in young 
pigs (for reviews see Enjuanes and van der Zeijst, 1995; 
Saif and Wesley, 1999). 

The viral proteins required for TGEV genome replication
and transcription are encoded by the replicase gene
(Eleouet et al., 1995; Penzes et al., 2001). This gene
encodes two replicative polyproteins, pp1a (447 kDa) and
pp1ab (754 kDa), that are processed by virus-encoded
proteinases to produce the functional subunits of the
replication complex (reviewed in Ziebuhr et al., 2000).
The central and C-proximal regions of pp1a and pp1ab are
processed by a 33.1 kDa viral cysteine proteinase which is
called the 'main proteinase' (Mpro) or, alternatively, the
'3C-like proteinase' (3CLpro). The name '3C-like proteinase'
was introduced originally because of similar substrate
specificities of the coronavirus Mpro and picornavirus 3C
proteinases (3Cpro) and the identification of cysteine as the
principal catalytic residue in the context of a predicted
two-β-barrel fold (Gorbalenya et al., 1989a,b). Meanwhile,
however, several studies have revealed significant differences
in both the active sites and domain structures
between the coronavirus and picornavirus enzymes (Liu
and Brown, 1995; Lu and Denison, 1997; Ziebuhr et al.,
1997, 2000; Hegyi et al., 2002). Also, the crystal structures
reported for a number of picornavirus 3C proteinases
(Allaire et al., 1994; Matthews et al., 1994; Bergmann
et al., 1997; Mosimann et al., 1997) have not been useful
in predicting the three-dimensional structures of coronavirus
main proteinases. Because of the large phylogenetic
distance between the two groups of enzymes, we will use
the term coronavirus Mpro throughout this article.

Sequence comparisons (Figure 1) and experimental data
obtained for other coronavirus homologues allow us to
predict that the mature form of the TGEV Mpro is released
from pp1a and pp1ab by autoproteolytic cleavage at
flanking Gln↓(Ser,Ala) sites (Eleouet et al., 1995; Hegyi
and Ziebuhr, 2002). Accordingly, the TGEV Mpro has 302
amino acid residues that correspond to the pp1a/pp1ab
residues 2879-3180. In vivo and in vitro analyses of avian
infectious bronchitis virus (IBV), mouse hepatitis virus
(MHV) and human coronavirus 229E (HCoV 229E) Mpro
activities have shown consistently that the proteinase
cleaves the replicase polyproteins at 11 conserved sites
and, therefore, it seems reasonable to conclude that the
Mpro-mediated processing pathways are conserved in all
coronaviruses, including TGEV.
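
The numbering relationship stated above (302 Mpro residues corresponding to pp1a/pp1ab residues 2879-3180) can be written as a small conversion sketch. The function and constant names are hypothetical; the only facts taken from the text are the start position, the end position and the length.

```python
# Offset taken from the text: Mpro residue 1 corresponds to pp1a/pp1ab
# residue 2879, and the mature proteinase has 302 residues.
MPRO_START_IN_PP1A = 2879
MPRO_LENGTH = 302


def mpro_to_pp1a(mpro_residue: int) -> int:
    """Convert an Mpro residue number (1..302) to pp1a/pp1ab numbering."""
    if not 1 <= mpro_residue <= MPRO_LENGTH:
        raise ValueError("Mpro residue number out of range")
    return MPRO_START_IN_PP1A + mpro_residue - 1


def pp1a_to_mpro(pp1a_residue: int) -> int:
    """Convert a pp1a/pp1ab residue number back to Mpro numbering."""
    mpro_residue = pp1a_residue - MPRO_START_IN_PP1A + 1
    if not 1 <= mpro_residue <= MPRO_LENGTH:
        raise ValueError("residue lies outside the Mpro region")
    return mpro_residue


# The last Mpro residue (Gln302) should map to polyprotein residue 3180.
print(mpro_to_pp1a(302))
```

The same helper gives the polyprotein positions of the catalytic residues discussed later (Cys144 and His41) without any further assumptions.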

Previous theoretical studies and experimental data have
led to the following conclusions (Bazan and Fletterick,
1988; Gorbalenya et al., 1989a,b; Liu and Brown, 1995;
Lu et al., 1995; Ziebuhr et al., 1995, 1997, 2000; Lu and
Denison, 1997; Seybert et al., 1997; Ziebuhr and Siddell,
1999; Ng and Liu, 2000; Hegyi et al., 2002): (i) Coronavirus
main proteinases employ conserved cysteine and
histidine residues in the catalytic site. In TGEV Mpro, these
are Cys144 and His41. There has been some debate on the
existence of a third residue in the catalytic centre. In
common with picornavirus 3C proteinases, the catalytic
centre of the coronavirus Mpro is predicted to be embedded
in a chymotrypsin-like, two-β-barrel structure in which
cysteine (rather than serine) serves as the principal
nucleophile. (ii) Coronavirus main proteinases have
well-defined substrate specificities. All known cleavage
sites contain bulky hydrophobic residues (mainly leucine)
at the P2 position, glutamine at the P1 position, and small

©European Molecular Biology Organization 





K.Anand et al. 


Fig. 1. Sequence comparison of coronavirus main proteinases. The alignment was produced using CLUSTAL X, version 1.81 (Thompson et al., 1997),
and corrected manually on the basis of the three-dimensional structure of TGEV Mpro. The corresponding sequences of FIPV (strain 79-1146), HCoV
(strain 229E), bovine coronavirus (BCoV, isolate LUN), MHV (strain JHM) and IBV (strain Beaudette) were derived from the replicative polyproteins
of the respective viruses whose sequences are deposited at the DDBJ/EMBL/GenBank database (accession Nos: FIPV, AF326575; HCoV, X69721;
BCoV, AF391542; MHV, M55148; IBV, M95169; TGEV, AJ271965). The β-strands and α-helices as revealed in the TGEV Mpro crystal structure
(this study) are shown above the sequence alignment (see also Figures 4 and 5). Black background colour indicates the catalytic cysteine and histidine
residues. Grey background colour indicates the key residue of the S1 subsite (TGEV Mpro His162) and its equivalents in other coronavirus main
proteinases. Also shown in grey are the phenylalanine and tyrosine residues (TGEV Mpro Phe139 and Tyr160) that are proposed to stabilize the neutral
state of His162 (see text for details).


aliphatic residues at the P1' position. (iii) Coronavirus
main proteinases possess a large C-terminal domain of
~110 amino acid residues that is not found in other RNA
virus 3C-like proteinases. The characterization of recombinant
proteins, in which 33, 28 and 34 C-terminal amino
acid residues were deleted from the IBV, MHV and HCoV
main proteinases, respectively, resulted consistently in
dramatic losses of proteolytic activity, suggesting that the
C-terminal domain of Mpro contributes to proteolytic
activity through undefined mechanisms.
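
The substrate specificity summarized in point (ii) can be illustrated with a short motif scan. This is only a sketch: the residue classes chosen for P2 and P1' are assumptions made for illustration (the text says only "bulky hydrophobic, mainly leucine" and "small aliphatic"), and a sequence match is not evidence that a site is actually cleaved.

```python
import re
from typing import List

# Assumed residue classes (not specified in the text beyond "bulky
# hydrophobic, mainly leucine" at P2 and "small aliphatic" at P1').
P2_RESIDUES = "LIVMF"
P1_PRIME_RESIDUES = "SAG"

# P2 residue, then the P1 glutamine, followed (lookahead) by the P1' residue.
SITE_PATTERN = re.compile(
    "[" + P2_RESIDUES + "]Q(?=[" + P1_PRIME_RESIDUES + "])"
)


def candidate_cleavage_sites(sequence: str) -> List[int]:
    """Return 1-based positions of the P1 glutamine in each candidate site."""
    return [m.start() + 2 for m in SITE_PATTERN.finditer(sequence.upper())]


# The 15-mer substrate peptide listed in Table I (N-terminal autoprocessing site).
peptide = "VSVNSTLQSGLRKMA"
print(candidate_cleavage_sites(peptide))
```

For the Table I peptide the scan reports the glutamine of the Leu-Gln-Ser motif, i.e. cleavage would be predicted between that glutamine and the following serine.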

The 1.96 Å TGEV Mpro crystal structure reported herein
reveals the structural details of a unique catalytic system
and facilitates the interpretation of previously published
mutagenesis studies that have, at least in part, remained
speculative due to the complete lack of structural information
on '3C-like' enzymes.

Results and discussion 

Structure determination by MAD phasing 

The presence of 10 methionine residues in the TGEV Mpro
molecule suggested that selenomethionine-based multiwavelength
anomalous dispersion (MAD; Hendrickson
et al., 1990) could be used to solve the phase problem. The
unit cell dimensions of the crystals (a = 72.8 Å, b = 160.1 Å,
c = 88.9 Å, β = 94.3°, space group P2₁) and self-rotation
calculations indicated the presence of as many as six
TGEV Mpro molecules per asymmetric unit. In the MAD
phasing process, we finally succeeded in locating 48 (out
of 60) crystallographically independent selenium sites by
the 'Shake & Bake' approach to direct methods (Weeks
and Miller, 1999), without recourse to heavy atom
derivatives or other methods of phasing (see Materials
and methods). The phases obtained resulted in a readily
interpretable electron density map.
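
One common way to check whether six molecules per asymmetric unit is plausible for this cell is the Matthews coefficient. The sketch below is an illustration added here, not a calculation reported in the text: it uses the stated cell dimensions and the 33.1 kDa mass, assumes two asymmetric units per cell for P2₁, and uses the standard approximation 1 - 1.23/Vm for the solvent fraction.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (Å^3) of a monoclinic unit cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, molecules_per_cell, molecular_mass_da):
    """Matthews coefficient Vm (Å^3 per Da)."""
    return cell_volume / (molecules_per_cell * molecular_mass_da)


def approx_solvent_fraction(vm):
    """Standard approximation for the solvent fraction of a protein crystal."""
    return 1.0 - 1.23 / vm


# Cell dimensions and molecular mass as given in the text.
volume = monoclinic_cell_volume(72.8, 160.1, 88.9, 94.3)
# Assumption: P2(1) has two asymmetric units per cell, each with six molecules.
molecules_per_cell = 2 * 6
vm = matthews_coefficient(volume, molecules_per_cell, 33_100)
solvent = approx_solvent_fraction(vm)
# Ten methionines per molecule and six molecules give the 60 expected Se sites.
expected_se_sites = 10 * 6

print(f"V = {volume:.0f} Å^3, Vm = {vm:.2f} Å^3/Da, solvent ~ {solvent:.0%}")
print(f"expected Se sites: {expected_se_sites}")
```

With these inputs Vm comes out at roughly 2.6 Å³/Da, which lies within the range usually seen for protein crystals and is therefore consistent with six molecules per asymmetric unit.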

Quality of the model 

All six copies (designated A-F) of the TGEV Mpro in the
asymmetric unit of the crystal could be built into well-defined
electron density (Figure 2), which covered almost
all of the 302 amino acid residues of each monomer. The
only exceptions were the two C-terminal residues which
were not visible in five of the six chains. Monomers A, E
and F also lacked electron density for residue 300.
Fig. 2. Stereo view of a representative part of the electron density map. The 2|Fo| - |Fc| electron density map (1.96 Å resolution, contoured at 1σ
above the mean) corresponds to Mpro residues 160-162 (Tyr-Met-His), a conserved motif in coronavirus main proteinases. The strong hydrogen
bonding interaction between the Tyr160 hydroxyl group and His162 Nδ1 is indicated.


The final model comprises 1798 amino acid residues
and 1006 water molecules, as well as 27 sulfate ions,
nine dioxane molecules and six 2-methyl-2,4-pentanediol
(MPD) molecules from the crystallization medium. The
refinement converged to a final R-factor of 0.210 and an
Rfree (Brünger, 1992) of 0.256, with good stereochemistry.
Altogether, 88.4% of the amino acid residues were found
in the most favoured regions of the Ramachandran plot,
and 10.8% were in additionally allowed regions. Residues
Asn70, Asn71 and Ser279 were in regions only generously
allowed, but had clear electron density.
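
For readers unfamiliar with the quality indicators quoted above, the conventional crystallographic R-factor is the summed absolute difference between observed and calculated structure-factor amplitudes divided by the summed observed amplitudes; Rfree is the same quantity computed over a set of reflections excluded from refinement. The sketch below uses made-up amplitudes purely for illustration and omits the scaling between observed and calculated data that real refinement programs apply.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over paired reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, not data from this structure.
toy_f_obs = [100.0, 50.0, 25.0, 25.0]
toy_f_calc = [90.0, 55.0, 30.0, 25.0]
toy_r = r_factor(toy_f_obs, toy_f_calc)
print(f"toy R-factor = {toy_r:.3f}")
```

Applying the same function to the working reflections and, separately, to the excluded test reflections would give R and Rfree, respectively.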

Domain structure 

The six TGEV Mpro monomers present in the asymmetric
unit are arranged in three dimers (Figure 3). Each
monomer is folded into three domains, the first two of
which are antiparallel β-barrels reminiscent of those found
in serine proteinases of the chymotrypsin family (Figure 4).
Residues 8-100 form domain I, and residues 101-183
make up domain II. The connection to the C-terminal
domain III is formed by a long loop comprising residues
184-199. Domain III (residues 200-302) contains a novel
arrangement of five α-helices. A deep cleft between
domains I and II, lined by hydrophobic residues, constitutes
the substrate-binding site. The catalytic site is
situated at the centre of the cleft.

The interior of the β-barrel of domain I consists entirely
of hydrophobic residues. A short α-helix (helix A;
Tyr53-Ser58) closes the barrel like a lid. Domain II is
smaller than domain I and also smaller than the
homologous domain II of chymotrypsin and hepatitis A
virus (HAV) 3Cpro (Tsukada and Blow, 1985; Allaire et al.,
1994; Bergmann et al., 1997). Several secondary structure
elements of HAV 3Cpro (strands b2II and cII and the
intervening loop) are missing in the TGEV Mpro. Also, the
domain II barrel of the TGEV Mpro is far from perfect
(Figure 4). The segment from Gly135 to Ser146 forms a
part of the barrel, even though it consists mostly of
consecutive loops and turns. In fact, in contrast to domain
I, a structural alignment of domain II has proven difficult.
The superposition of domains I and II of the TGEV Mpro
onto those of the HAV 3Cpro yields an r.m.s.d. of
1.85 ± 0.05 Å for 114 equivalent (out of 184 compared)
Cα pairs, while domain II alone displays an r.m.s.d. of
3.25 ± 0.28 Å for 57 (out of 85) Cα pairs.
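
The r.m.s.d. values quoted above measure the average displacement between equivalent Cα atoms after the two structures have been superposed. A minimal sketch of the final step is given below; it assumes the coordinates have already been superposed and paired (the fitting procedure itself is not shown), and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """R.m.s.d. (Å) between paired (x, y, z) coordinates.

    Assumes both lists are already superposed and list equivalent atoms
    in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [
        sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical Cα positions: every atom is displaced by 1 Å along z.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(toy_a, toy_b))
```

Because only atoms judged equivalent enter the sum, the number of pairs used (for example 114 of 184) should always be reported alongside the r.m.s.d., as is done in the text.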

Domain III is composed of five, mostly antiparallel,
α-helices and the loops connecting them. The crossover
angles are ~90° between helices B and E, ~30° between B
and D, ~20° between C and E, and ~80° between E and F,
whereas C-B and B-F are parallel to each other (see
Figure 5). Interhelical contacts are mediated by hydrophobic
side chains. The loops between the helices are quite
long and fill up most of the interstitial space of domain III.
Database searches (Holm and Sander, 1993; Gilbert et al.,
1999) did not reveal other proteins or protein domains with
the same topology as domain III. The N-terminal segment
(residues 1-5) of the polypeptide chain folds onto domain
III, placing the N-terminus of the protein within 17.0
(±2.7) Å of the C-terminus (Figure 4).

The six copies of the TGEV Mpro in the asymmetric unit
of the crystal are highly similar. The core regions of
domains I and II display an r.m.s.d. of 0.29 (±0.09) Å for
130 equivalent Cα atoms (monomer A as a reference;
herein, geometrical values given are the r.m.s. over the six
monomers, with the corresponding standard deviation). If
all 299 well-determined Cα positions are included, the
average r.m.s.d. for all monomers is 0.57 (±0.18) Å. The
largest deviations of the main chain trace are in: (i) the
N-terminal segment from residues 1 to 4 (average r.m.s.d.
1.69 ± 0.91 Å); (ii) the flexible surface loop from residues
216 to 225 (average r.m.s.d. 0.99 ± 0.51 Å); (iii) the
C-terminus of helix E and the loop region between
residues 267 and 276 (average r.m.s.d. 0.99 ± 0.42 Å); and
(iv) the segment 294-300 following the C-terminal F helix
(average r.m.s.d. 1.55 ± 0.44 Å). In addition to being
flexible and at the surface of the molecules, segments
Fig. 3. Stereo depiction of the six molecules (three dimers) of TGEV Mpro in the asymmetric unit. The monomers A-F are shown in different colours;
A = red, B = black, C = green, D = orange-red, E = yellow and F = cyan. Note the 2-fold symmetry axes between the monomers in each of the dimers,
and between the two lower dimers in the figure (AB and EF). Each of the monomers measures ~70 Å × 22 Å × 40 Å.


(ii) and (iii) are involved in interdimer crystal contacts in
some but not all of the six protomers. Surprisingly, the
regions with the highest r.m.s.d. are not the regions with
the highest temperature factors, except for the C-terminal
domain of monomer F which does have high temperature
factors (~70 Å²; whole model 47 Å², including all 1006
water molecules).


Active site 

The active site of the coronavirus Mpro is similar to those of
the picornavirus 3C proteinases, as had been predicted
earlier (Gorbalenya et al., 1989b). The mutual arrangement
of the nucleophilic Cys144 and the general acid-base
catalyst His41 of TGEV Mpro is identical to that of the
HAV 3Cpro Cys172 and His44 residues and the Ser195 and
His57 residues of chymotrypsin. The distance between the
sulfur atom of Cys144 and the Nε2 of His41 is 4.05
(±0.04) Å, i.e. longer than the corresponding cysteine-histidine
distances in HAV 3Cpro (3.92 Å; Bergmann
et al., 1997), poliovirus (PV) 3Cpro (3.4 Å; Mosimann et al.,
1997) and papain (3.65 Å; Kamphuis et al., 1984)
(Figure 6B and C). In contrast to papain, but in agreement
with the picornavirus 3C proteinases, the sulfur atom
is in the plane of the histidine imidazole. There are
clear indications from the difference Fourier synthesis
(Figure 6A) that Cys144 is oxidized, at least to the stage of
the sulfinic acid, -SO2⁻, and probably to the sulfonic acid,
-SO3⁻, in all six copies of TGEV Mpro in the crystal. Such
oxidation could occur during the time required for
crystallization or during X-ray data collection, and would
lead to inactivation of the enzyme. Refinement of the
corresponding derivatives was, however, not successful.


It is generally assumed that the native state of the active
site of papain-like cysteine proteinases is a thiolate-imidazolium
ion pair formed by cysteine and histidine
residues (Polgar, 1974). In proteinases of the papain
family, an asparagine is the third member of the catalytic
triad. Chymotrypsin and other members of this serine
proteinase family have a catalytic triad consisting of
Ser195...His57...Asp102. In HAV 3Cpro, Asp84 is present
at the required position, although its side chain points
away from His44, making its role disputable (Malcolm,
1995; Bergmann et al., 1997). PV 3Cpro, human rhinovirus
(HRV) 3Cpro and HRV 2Apro have a glutamate or aspartate
in the proper orientation to accept a hydrogen bond from
the active site histidine (Matthews et al., 1994; Mosimann
et al., 1997; Petersen et al., 1999). In contrast, TGEV Mpro
has Val84 in the corresponding position, with its side chain
pointing away from the catalytic site (Figure 6B and C). A
buried water molecule is found in the place that normally
would be occupied by the side chain of the third member
of the catalytic triad. This water molecule makes hydrogen
bonds to His41 Nδ1, His163 Nδ1 and Asp186 Oδ1
(Figure 6B). His163 is not conserved among coronavirus
main proteinases and its substitution by leucine (Mpro-H163L)
had no significant effect on the proteolytic activity
in the standard peptide assay (see Materials and methods),
as compared with the activity of the wild-type Mpro
(Table I). Asp186 makes a salt bridge to Arg40 that
appears to be required to maintain the active site geometry,
since both Asp186 and Arg40 are absolutely conserved
among coronaviruses. Through this (and other) interactions,
the polypeptide segment 184-199, which connects
domains II and III and is probably involved in
substrate binding (see below), is held in the proper
Fig. 5. Topological representation of the secondary structure elements
of a TGEV Mpro monomer. α-Helices and β-strands are represented as
cylinders and arrows, respectively. Numbers indicate the N- and
C-terminal residues of the secondary structure elements. Strands bI and
cI are adjacent. Cys144 (yellow) and His41 (blue) are shown by circles.
The positions of the N- and C-termini are indicated. Also, the presumed
localization of the P5-P1 region of a model substrate is shown (blue)
(for details, see text and Figure 7).


Fig. 4. A MOLSCRIPT diagram (Kraulis, 1991) showing the overall
fold of TGEV Mpro (A) with the two β-barrel domains and the α-helical
C-terminal domain. β-Strands and helices are represented as arrows and
cylinders, respectively. The β-barrels of each domain I and II are composed
of six-stranded β-sheets (green). Domain III is composed mainly
of α-helices (red). The structures of HAV 3Cpro (PDB code: 1HAV) (B)
and α-chymotrypsin (4CHA, residues 12-15 and 147-148 are
excised) (C) are shown for comparison.


position. Taken together, the data contradict a direct
involvement of His163 or Asp186 in catalysis, making the
TGEV Mpro a clear case of a viral cysteine proteinase
employing only a catalytic dyad.

Substrate hydrolysis by cysteine and serine proteinases
occurs through a covalent tetrahedral intermediate resulting
from attack of the active site nucleophile on the
carbonyl carbon of the scissile bond. The developing
oxyanion is stabilized by strong hydrogen bonds donated
by amide groups of the enzyme. This so-called 'oxyanion
hole' is also found in TGEV Mpro. It is made up by the
main chain amides of Gly142, Thr143 and Cys144
(Figure 6B).


Substrate-binding site 

The specificity of Mpro for a very limited range of amino
acids at the P1, P2 and P4 positions resembles the substrate
specificity of picornavirus 3C proteinases (Palmenberg,
1990; Ziebuhr et al., 2000). This leads us to believe that,
similarly to 3Cpro (Matthews et al., 1994; Bergmann et al.,
1997; Mosimann et al., 1997), specific substrate binding
by Mpro is ensured by well-defined S4, S2 and S1
specificity pockets. In order to visualize potential interactions
with the substrate, we have modelled a pentapeptide
representing the P5-P1 residues of a TGEV Mpro
cleavage site (Asn-Ser-Thr-Leu-Gln, pp1a amino acids
2874-2878; Hegyi and Ziebuhr, 2002) into the substrate-binding
cleft of Mpro (Figure 7). The model is based on
the assumption that Mpro binds substrates in a manner
analogous to that found in complexes of chymotrypsin-like
proteinases with peptide inhibitors. X-ray structures have
shown that the P4-P1 residues of peptide inhibitors
assume a common main chain conformation when bound
to these proteinases, with the P4 and P3 residues adopting
a β conformation and the P2 and P1 residues assuming a
specific main-chain conformation suitable to place their
side chains in the pre-formed S1 and S2 specificity pockets





Fig. 6. Active site of the TGEV Mpro. (A) Difference electron density (|Fo| - |Fc| at 3.0σ above the mean; red) for the oxidized active site Cys144, indicating
three oxygen atoms bound to the sulfur. (B) The catalytic Cys144 and His41 residues are shown. The region forming the oxyanion hole (main
chain amides of Gly142, Thr143 and Cys144) is highlighted in pink. The water molecule, which occupies a position equivalent to that of the catalytic
aspartate of serine proteinases, is shown together with its hydrogen-bonding partners, His41, His163 and Asp186. (C) Superposition of the active site
residues of chymotrypsin (shown in red) with the spatially equivalent residues of TGEV Mpro (blue) and HAV 3Cpro (green). The equivalent to the
third catalytic residue (Asp102) of chymotrypsin is Asp84 in HAV 3Cpro (side chain oriented differently) and Val84 in TGEV Mpro.


Table I. Enzymatic activities of TGEV Mpro mutants

Plasmid | Oligonucleotides used for cloning or mutagenesis (5'→3') | Protein | Mpro amino acids | Activity (%)a
pMal-Mpro | TCAGGTTTGCGGAAAATGGCAC, AAAAGGATCCTTACTGAAGATTTACACCATACATTTG | Mpro | Ser1-Gln302 | 100
pMal-MproΔ184-302 | TCAGGTTTGCGGAAAATGGCAC, AAAGGATCCTTAACCACCGTACATTTCTCCTTCAAAATT | MproΔ184-302 | Ser1-Gly183 | <0.02
pMal-MproΔ200-302 | TCAGGTTTGCGGAAAATGGCAC, AAAGGATCCTTATGACATGACATTAGTACCTTCCAATTG | MproΔ200-302 | Ser1-Ser199 | 0.4
pMal-MproΔ1-5/Δ200-302 | ATGGCACAGCCTAGTGGTCTTGTA, AAAGGATCCTTATGACATGACATTAGTACCTTCCAATTG | MproΔ1-5/Δ200-302 | Met6-Ser199 | 0.6
pMal-MproΔ1-5 | ATGGCACAGCCTAGTGGTCTTGTA, AAAAGGATCCTTACTGAAGATTTACACCATACATTTG | MproΔ1-5 | Met6-Gln302 | 0.3
pMal-Mpro-H163L | GTATACATGCATCTCTTAGAACTTGGAAATGGCTCGCAT, TCCAAGTTCTAAGAGATGCATGTATACAAAATAGAGAAT | Mpro-H163L | Ser1-Gln302 (His163→Leu) | 98
pMal-Mpro-C144A | AGCTGGTACTGCTGGATCAGTAGGTTATGTGTTAGAA, CTACTGATCCAGCAGTACCAGCTATAAAAGATCCTTT | Mpro-C144A | Ser1-Gln302 (Cys144→Ala) | <0.02

The sequence of the 15mer substrate peptide, H2N-VSVNSTLQSGLRKMA-COOH, was derived from the N-terminal Mpro autoprocessing site
(residues shown in bold indicate the scissile bond). The activity of wild-type Mpro (encompassing 302 residues) was taken as 100% and the mean
value of three experiments, which did not vary by more than 15%, is shown.

aProteolytic activities were determined using a peptide-based cleavage assay (Ziebuhr et al., 1997; see Materials and methods).


(James et al., 1980; Fujinaga et al., 1985, 1987; Matthews
et al., 1999). These studies lead us to suggest that the
residues P5 to P3 of Mpro substrates may form an
antiparallel β-sheet with segment 164-167 of the long
strand eII on one side, and with the segment 186-191
(which links domains II and III) on the other. Hydrogen
bonding interactions are likely between the main chain
amide and carbonyl oxygen atoms of substrate residues
Thr(P3), Ser(P4) and Asn(P5) and the main chain atoms of
TGEV Mpro residues Glu165, Ser189 and Gly167 (see
Figure 7).


S1 subsite

It has been shown for the HAV, HRV and PV 3Cpro
enzymes that the imidazole side chain of a conserved
histidine, which is located in the centre of a hydrophobic
pocket, interacts with the P1 carboxamide side chain of the
substrate. This interaction is generally accepted to determine
the picornavirus 3Cpro specificity for glutamine at P1
(Matthews et al., 1994, 1999; Bergmann et al., 1997;
Mosimann et al., 1997). Mutational analyses revealed that
any replacement of His162 completely abolished the
proteolytic activities of the HCoV and feline infectious
Fig. 7. Stereo diagram of a P5-P1 substrate (Asn-Ser-Thr-Leu-Gln, red; corresponding to the TGEV Mpro N-terminal autoprocessing site) modelled
into the active site cleft of the TGEV Mpro. Hydrogen bonds are depicted by dotted lines.


peritonitis virus (FIPV) Mpro enzymes (Ziebuhr et al.,
1997; Hegyi et al., 2002). The structure shows that the
imidazole side chain of His162 is positioned suitably to
interact with a P1 glutamine side chain. His162 is located
at the very bottom of a hydrophobic pocket which is
formed by residues Phe139 and the main-chain atoms of
Ile140, Leu164, Glu165 and His171. The side chain of
Glu165 forms an ion pair (2.96 ± 0.14 Å) with His171.
This salt bridge is itself on the periphery of the molecule,
forming part of the 'outer wall' of the S1 subsite.
Accordingly, mutants of the HCoV 229E Mpro, in which
the residue equivalent to His171 had been replaced by
alanine, serine or threonine, retained significant proteolytic
activities (Ziebuhr et al., 1997). In order to interact
with the P1 glutamine side chain of the substrate, His162
has to maintain a neutral state over a wide pH range. Most
probably, this is achieved by two important interactions:
(i) stacking onto the phenyl ring of Phe139, at a distance of
3.53 ± 0.18 Å; and (ii) accepting a hydrogen bond from
the buried Tyr160 hydroxyl group which has no other
hydrogen-bonding partner. The role proposed for the
hydroxyl group of Tyr160 is strongly supported by FIPV
Mpro mutagenesis studies in which the proteolytic activities
of Y160F, Y160G, Y160A and Y160T mutants were
shown to be dramatically reduced (Hegyi et al., 2002).
Tyr160 is part of the absolutely conserved coronavirus
Mpro sequence signature, Tyr160-X-His162 (Figures 1 and
2), whereas Gly(Ala)-X-His is found at the equivalent
sequence position in most 3C and 3C-like proteinases
(Gorbalenya et al., 1989a). Accordingly, in the 3C and 3C-like
proteinases, stabilization of histidine in the neutral
tautomeric state has to be ensured by other residues.
Notably, in the case of PV 3Cpro, this involves a tyrosine
residue (Tyr138) which, however, is provided by a
different part of the structure (β-strand cII; Mosimann
et al., 1997). For HAV 3Cpro, other mechanisms are
proposed (Bergmann et al., 1997).


Halfway down the S1 subsite of TGEV Mpro, there is
dumbbell-shaped electron density which we have assigned
to two water molecules, although theoretically they are too
close to one another (2.10 ± 0.16 Å). One of them makes a
hydrogen bond with Nε2 of His162, while the second one,
unusually for water, makes no additional contacts. In our
model of the substrate complex, these two water molecules
mark the position of the carboxamide group of the P1
glutamine side chain.

S2 subsite 

Coronavirus main proteinases have a strong preference for
leucine at the P2 position (Ziebuhr et al., 2000). The
putative S2 subsite identified in the structure is a
hydrophobic pocket that is suitably positioned and large
enough to accommodate a leucine side chain easily. The
S2 pocket is lined by the side chains of Leu164 (the main
chain of which forms part of the S1 subsite, see above),
Pro188, Ile51, His41 and Thr47 (Figure 7). In our electron
density maps, part of the S2 subsite (of all six copies of the
monomer) harbours extra electron density that we interpreted
as an MPD molecule from the crystallization
medium. In the HAV 3Cpro, the corresponding subsite is
formed by different parts of the polypeptide chain. It is
also smaller and can accommodate the side chains of
serine and threonine (Bergmann et al., 1997).

Quaternary structure 

The quaternary arrangement of the proteinase is a
homodimer, with three copies in the asymmetric unit
(monomers A and B, C and D, and E and F). All dimers
have approximate C2 symmetry (Figure 3) and ~1580
(±199) Å² of each monomer, i.e. 11-12% of its solvent-accessible
surface, are buried upon dimerization. The
dimer formation is driven mainly by intermolecular
interactions between domains II and III of one monomer
and the N-terminal residues of the other (see below for
Fig. 8. Intra- and intermolecular contacts of the TGEV Mpro N-terminus. (A) MOLSCRIPT stereo representation of a TGEV Mpro dimer. Molecule A
is coloured from blue at the N-terminus, via green (domain II), to red (C-terminus), while molecule B is shown in grey. The catalytic Cys144 and
His41 residues are labelled in both monomers. (B) Detailed view of the interactions made by the N-terminal segment (blue) and domains II/III of
monomer A as well as domains II/III of monomer B. Residues critically involved in these interactions are designated by the single-letter code and
shown in ball-and-stick representation (see text for details). The N- and C-termini of molecule A are indicated.


further details). In contrast, the domain III-domain III
interface appears to be the consequence rather than the
cause of other intermolecular interactions. It involves a
relatively small area of 337 ± 45 Å² and comprises only
two hydrogen bonds, between the amide group of
Gly281 (molecule A) and the main-chain oxygen of
Ser279 (molecule B), as well as its symmetry mate,
Gly281B...Ser279A (3.22 ± 0.37 Å, averaged over all six
monomers).

Interestingly, the N-terminal residues of each monomer
are relatively close to the substrate-binding site of the
other monomer in the dimer. The following observations
for monomer A hold true for all other monomers. The
NH3+ group of Ser1A, which is the P1' residue of the
autocleavage reaction of TGEV Mpro, is 11.9 ± 1.6 Å from
the active site Cys144B Sγ of the second molecule in the
dimer but as much as 34.2 ± 0.9 Å away from its own
active site cysteine. Ser1A is in contact with residues
participating in the substrate-binding site of monomer B.
Its NH3+ group makes a salt bridge (4.99 ± 1.04 Å) to the
carboxylate of Glu165B (Figure 8). This glutamate, which
is absolutely conserved among coronaviruses, is part of the
S1 subsite (see above), where it also interacts with His171.
Although these two side chains form the ‘wall’ of the
specificity site, they have their polar groups oriented
towards the surface of the proteinase molecule and away
from the substrate’s P1 glutamine. An intermolecular ionic
interaction between Arg4A and Glu286B (6.0 ± 0.7 Å)
appears to play a role in positioning the N-terminal
residues. Because of the 2-fold non-crystallographic
symmetry (NCS), the same interaction occurs between
Arg4B and Glu286A. Residues 6A-8A form a short
β-strand interacting with strand eII of monomer B (at
Val124B). Most of the interactions between the
N-terminus of molecule A and the region next to the S1
subsite of molecule B constitute a perfect fit. Given
that the P' residues in serine and cysteine proteinases
constitute the leaving group of the cleavage reaction and,
in coronavirus main proteinases, are not subject to
stringent specificity requirements, it is quite conceivable
that, after autoproteolysis, the N-terminus of one monomer
slides over the active site of the partner monomer and
adopts the position seen in our crystal structure, i.e. with
Ser1A interacting with Glu165B at the ‘outer wall’ of the
S1 subsite. This, in turn, would suggest that the dimer we
are seeing corresponds to the product of the autolysis
reaction and that this occurs in trans. Molecular modelling
revealed that binding of the Mpro N-terminus in the active
site cleft of the same molecule would require remodelling
of the entire N-terminal segment and beyond (residues
1-13; data not shown), making cleavage in cis less likely.
There is additional experimental evidence supporting
these conclusions. First, dilution experiments with MHV
Mpro translated in vitro contradict cis-cleavage activity (Lu
et al., 1996). Secondly, the fact that, early in infection,
Mpro remains part of a relatively stable 150 kDa precursor
protein in which it is flanked by hydrophobic domains
(Schiller et al., 1998) argues against rapid autoprocessing
in cis. The proposed model of intermolecular self-processing
would imply that components of the replication
complex could first be anchored to membranes (i.e. the site
of RNA replication) in an uncleaved form, and only later,
when the precursor proteins accumulate to high local
concentrations, will Mpro release itself by intermolecular
cleavage, thereby triggering the complete spectrum of
trans-processing reactions.

Intramolecular interactions of the N-terminus 

A specific conformation of the N-terminal segment allows
it to ‘squeeze’ residues 1-8 in between domains II and III
of the same monomer and domains II and III of monomer
B (see above and Figure 8). In this context, the N-terminus
also interacts with domains II and III of its own protomer.
For example, the side-chain amino group of Lys5A makes
strong intramolecular hydrogen bonds with Ser110A Oγ of
domain II (2.83 ± 0.15 Å), and with the Glu286A main-chain
oxygen (2.80 ± 0.07 Å), as well as with Glu291A
Oε1 (2.74 ± 0.13 Å) of domain III. Furthermore, the side
chain of Leu3A completes a hydrophobic patch on domain
III which includes Phe206A, Ala209A, Phe287A,
Val292A, the Cβ atom of Gln295A and Met296A;
these residues belong to helices B and F. All sequenced
members of the coronavirus proteinase family have a
hydrophobic residue in position 3, while glycine is
absolutely conserved in position 2 (see Figure 1). The
latter residue adopts the αL conformation which is easily
accessible only to glycine. To investigate the functional
significance of these interactions, a recombinant protein,
Mpro Δ1-5, in which the N-terminal residues Ser1-Lys5
were removed from the Mpro sequence, was expressed and
tested for proteolytic activity in a trans-cleavage assay
using a 15mer peptide representing the N-terminal TGEV
Mpro autoprocessing site. As shown in Table I, the activity
of Mpro Δ1-5 was decreased to only 0.3% of the Mpro
activity. We conclude from these data that, indeed,
residues 1-5 may be critically involved in stabilizing the
mutual orientation of domains II and III and thus,
indirectly, in maintaining the proper orientation of the
intervening loop region (residues 184-199). If this
hypothesis is correct, then the deletion of domain III
should have similarly detrimental effects on the proteolytic
activity and, in fact, the published data (see
Introduction) seem to support this conclusion. To corroborate
this hypothesis further, an additional set of Mpro
mutants was characterized in which we used the structural
information to remove domain III completely. In this
approach, the probability of domain III misfolding, which
might have been the cause of Mpro inactivation in previous
studies using randomly ‘truncated’ coronavirus main
proteinases (Lu and Denison, 1997; Ziebuhr et al., 1997;
Ng and Liu, 2000), should be significantly reduced. The
TGEV Mpro deletion mutants tested for activity comprised
(i) domains I and II (Mpro Δ184-302); (ii) domains I and II
together with the entire loop region (Mpro Δ200-302); or
(iii) domains I and II combined with the loop region but
lacking the five N-terminal residues (Mpro Δ1-5/Δ200-302).
As Table I shows, Mpro Δ200-302 had clearly
detectable (albeit significantly reduced) activity (0.4% of
Mpro). Similarly, the mutant Mpro Δ1-5/Δ200-302 had
significantly reduced activity (0.6% of Mpro). In sharp
contrast, no activities were detectable for Mpro Δ184-302
and the active site mutant, Mpro-C144A (the latter being
used as a negative control). The fact that residues 184-199
proved to be indispensable for proteolytic activity supports
our model of substrate binding (Figure 7) in which
residues of the loop are predicted to be critically involved
in the formation of a β-sheet-type structure with the
substrate (see above). The data also show that an intact
N-terminus and the C-terminal domain are required for full
activity. The structure suggests that the additional α-helical
domain III as well as the N-terminal residues help
fix domain II and the loop 184-199 in a catalytically
competent orientation. It will be interesting to investigate
whether similar mechanisms are also operating in other
3C-like proteinases with (smaller) C-terminal domains
(e.g. arteriviruses and potyviruses; Ziebuhr et al., 2000;
Hegyi et al., 2002).

Beyond its presumed role in proteolytic activity, domain
III may have other functions, which remain to be
determined. In contrast to picornavirus 3C proteinases
for which RNA-binding activities are well established
(Andino et al., 1993; Leong et al., 1993; Xiang et al.,
1995), the Mpro structure does not support such an activity
for the coronavirus main proteinase. Thus, calculation of
the electrostatic potential (Nicholls et al., 1991) does not
reveal an overall basic character of domain III, nor are
there distinct patches of basic or aromatic residues (data
not shown). The same applies to domains I and II. Also,
the conserved picornavirus sequence motif, KFRDI,
located between domains I and II, as well as the small
helices and reverse turns that together form the RNA-binding
site of HAV 3Cpro (Bergmann et al., 1997) are
missing in the TGEV Mpro structure.

Conclusion 

The crystal structure of TGEV Mpro shows that coronaviruses
have evolved proteinases in which a thiolate-imidazolium
catalytic dyad has been combined with a
two-β-barrel fold. This framework is extended further by a
novel α-helical domain that, together with the N-terminal
residues 1-5, appears to be involved in proteolytic activity
by maintaining the proper positioning of the presumed
substrate-binding loop, 184-199. We are confident that the
first crystal structure of a non-picornaviral chymotrypsin-like
cysteine proteinase will facilitate further molecular
modelling of other members of the huge family of RNA
viral ‘3C-like’ enzymes for which structural information is
still lacking.
Table II. Summary of X-ray diffraction data from crystals of native and SeMet-substituted Mpro

Beamline            XRD^a                BW7A^b
                                         Peak                       Edge              High              Low
Data set^e          Native               P1       P2       P3       E1       E2       H1       H2       L1
Wavelength (Å)^d    0.99983              0.97487  0.97845  0.97848  0.97864  0.97874  0.95583  0.9080   1.0022
Resolution (Å)^c    50-1.95 (1.98-1.95)  30-2.8   30-2.8   30-2.8   30-2.8   30-2.8   30-2.8   30-2.8   30-2.8
Completeness (%)^c  98.9 (97.0)          99.9     98.1     99.7     99.9     99.7     99.7     98.8     97.3
Mosaicity (°)       0.62                 0.4      0.6      0.7      0.4      0.6      0.4      0.6      0.4
Rmerge (%)^c,f      4.2 (22.1)           10.5     11.4     10.6     8.1      8.2      8.6      7.2      8.0
Rr.i.m. (%)^c,g     4.6 (27.1)           12.1     13.0     12.3     9.2      8.9      10.2     7.5      10.3
Rp.i.m. (%)^c,h     1.8 (15.2)           6.1      6.6      6.4      4.7      4.5      5.2      3.2      5.4
Redundancy^c        5.4 (2.9)            3.8      3.8      3.9      3.8      3.9      3.7      3.6      2.9
I/σ(I)^c            13.5 (4.0)           5.4      4.7      4.8      6.1      4.1      4.1      4.9      2.5

a X-ray diffraction beamline at ELETTRA, Trieste, equipped with a Mar CCD detector.
b Wiggler beamline of EMBL at DESY, Hamburg, equipped with a Mar CCD detector.
c Highest resolution bin in parentheses.
d The inflection point and peak wavelengths were collected in inverse beam mode, whereas the remote wavelengths were collected at the low energy
side of the Se edge where there is little anomalous signal and, as a result, no inverse beam data were collected.
e P1, P2, P3 = peak wavelengths 1, 2 and 3; E1, E2 = edge wavelengths 1 and 2 (point of inflection); H1, H2 = high energy remote wavelengths 1 and 2;
L1 = low energy remote wavelength.
f $R_{\mathrm{merge}} = 100 \times \sum_{hkl} \sum_i |I_i - \langle I \rangle| / \sum_{hkl} \sum_i I_i$, where $I_i$ is the observed intensity and $\langle I \rangle$ is the average intensity from multiple measurements.
g $R_{\mathrm{r.i.m.}} = 100 \times \sum_{hkl} [N/(N-1)]^{1/2} \sum_i |I_i - \langle I \rangle| / \sum_{hkl} \sum_i I_i$, where $N$ is the number of times a given reflection has been measured. This quality indicator
corresponds to an $R_{\mathrm{sym}}$ that is independent of the redundancy of the measurements.
h $R_{\mathrm{p.i.m.}} = 100 \times \sum_{hkl} [1/(N-1)]^{1/2} \sum_i |I_i - \langle I \rangle| / \sum_{hkl} \sum_i I_i$. This factor provides information about the average precision of the data.
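
The three merging statistics defined in footnotes f-h differ only in the multiplicity-dependent weight applied to each reflection. The following Python sketch shows one way to evaluate them from repeated intensity measurements. It is an illustration under stated assumptions, not the procedure used by SCALEPACK: the function name and the input layout (a dictionary keyed by hkl) are hypothetical, and reflections measured only once are skipped because the factor 1/(N - 1) is undefined for N = 1.

```python
import math


def merging_r_factors(observations):
    """Return (R_merge, R_rim, R_pim) in percent.

    `observations` maps each unique reflection (h, k, l) to the list of
    its measured intensities I_i. Singly measured reflections are
    skipped (assumption: N - 1 = 0 leaves the weights undefined).
    """
    num_merge = num_rim = num_pim = denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        average = sum(intensities) / n
        deviation = sum(abs(i - average) for i in intensities)
        num_merge += deviation
        num_rim += math.sqrt(n / (n - 1)) * deviation  # redundancy-independent
        num_pim += math.sqrt(1 / (n - 1)) * deviation  # precision-indicating
        denom += sum(intensities)
    if denom == 0:
        raise ValueError("no multiply measured reflections with non-zero intensity")
    return (100 * num_merge / denom,
            100 * num_rim / denom,
            100 * num_pim / denom)


# Hypothetical intensities for two reflections (arbitrary units).
example = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 40.0, 60.0]}
r_merge, r_rim, r_pim = merging_r_factors(example)
print(f"R_merge = {r_merge:.1f}%, R_rim = {r_rim:.1f}%, R_pim = {r_pim:.1f}%")
```

Because the R_pim weight shrinks as N grows, merging additional measurements of the same reflections lowers R_pim even when R_merge does not improve.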


Materials and methods 

Protein purification and crystallization 

Recombinant TGEV Mpro was expressed and purified as previously
described for the HCoV and FIPV main proteinases (Ziebuhr et al., 1997;
Hegyi et al., 2002). Briefly, the coding sequence of the TGEV Mpro was
inserted into the XmnI and BamHI sites of pMal-c2 plasmid DNA (New
England Biolabs). The resulting plasmid, pMal-Mpro, was used to
transform Escherichia coli TB1 cells. The maltose-binding protein
(MBP)-TGEV Mpro fusion protein was purified by amylose-agarose
chromatography, cleaved with factor Xa, and the recombinant Mpro
(residues Ser1-Gln302) was purified by hydrophobic interaction, anion
exchange and size exclusion chromatography (Hegyi et al., 2002). The
purified and concentrated TGEV Mpro (12.5 mg/ml) was stored in 12 mM
Tris-HCl pH 7.5, 120 mM NaCl, 1 mM dithiothreitol (DTT), 0.1 mM
EDTA. This protein solution was used to crystallize Mpro by the hanging
drop vapour diffusion method at 4°C. The best crystals, which were of
triangular shape and had dimensions of ~0.3 × 0.25 × 0.3 mm, were
obtained by using 100 mM HEPES pH 8.8, 1.8 M ammonium sulfate, 6%
MPD, 5 mM DTT and 4% dioxane as the reservoir and grew in ~10 days.

Incorporation of selenomethionine 

The Mpro structure could not be solved using conventional molecular
replacement techniques. Therefore, selenomethionine (SeMet)-substituted
TGEV Mpro was produced. The coding sequence of the MBP-TGEV
Mpro fusion protein was inserted into pET-11d (Novagen), and the
resulting plasmid, pET-TGEV-Mpro, was used to transform the
methionine-auxotrophic 834(DE3) E.coli strain (Novagen), which was
propagated in minimal medium containing 40 µg/ml seleno-L-methionine.
The SeMet-substituted TGEV Mpro was purified as described
above and concentrated to 9.5 mg/ml. Crystals of the SeMet-substituted
Mpro were grown as described for the native protein but using 2 M
ammonium sulfate and 8% MPD.

Diffraction data collection 

Crystals used for data collection were rinsed with mustard oil and cryo-cooled
in liquid nitrogen. Diffraction data up to 1.95 Å resolution were
collected from native crystals at 100 K on the X-ray diffraction beamline
at ELETTRA (Sincrotrone Trieste, Trieste, Italy), using a Mar165 CCD
detector (Table II). MAD data sets were collected to 2.8 Å resolution at
four wavelengths using a Mar165 CCD detector on beamline BW7A of
the EMBL Outstation at DESY (Hamburg, Germany). SeMet data sets
were collected for the f″ maximum and f′ minimum wavelengths.
Additional data were collected at remote wavelengths below and above
the Se K-edge (Table II). Data integration and scaling were performed
using DENZO and SCALEPACK (Otwinowski and Minor, 1997).


Structure determination 

The unit cell dimensions, as well as the self-rotation function (ALMN;
CCP4, 1994), implied that several monomers were present in the
asymmetric unit. A Matthews coefficient (Matthews, 1968) of 2.3 Å³/Da
and a solvent content of 51% were obtained assuming six molecules in the
asymmetric unit. The bottleneck of the structure determination was the
identification of the 60 selenium positions (six monomers with 10 Se
each). Solving the problem by SnB v2.0 (Weeks and Miller, 1999)
required data of increased precision, which were obtained by averaging of
several data sets and monitoring the process by Rp.i.m. (Weiss and
Hilgenfeld, 1997). Only after we had combined three merged peak-wavelength
data sets with two merged edge-wavelength data sets
(redundancy = 18) were we able to obtain 105 solutions (from 5000
trials) with significantly reduced minimal function values (Rmin = 0.49,
CC = 0.51; Hauptman, 1991) (details to be published elsewhere). The
positions of the best 60 atom solutions from SnB were examined for NCS.
In total, 37 positions were found to obey a 2-fold NCS. This symmetry
predicted a further 11 positions. All 48 positions were used in MLPHARE
(CCP4, 1994) for phasing, followed by solvent flattening and NCS
averaging in DM (Cowtan and Main, 1996). The resulting electron
density maps were of sufficient quality for chain tracing. The first
monomer was built manually into the experimental electron density map,
using the program ‘O’ (Jones et al., 1991). All other monomers were
generated by NCS. NCS restraints were applied during the initial stages of
refinement at low resolution and later gradually released as the resolution
limit was extended to 1.96 Å.

Cycles of adjustments to the model with O and subsequent refinement
using the program CNS (Brünger et al., 1998) converged to an Rfree of
0.256 and a crystallographic R-factor of 0.210. Data quality and
refinement statistics are given in Table III. The quality of the structural
model and its agreement with the structure factors were checked with
programs PROCHECK (Laskowski et al., 1993), WHATCHECK
(Vriend, 1990) and SFCHECK (Vaguine et al., 1999). Solvent
accessibility was calculated using the algorithm of Lee and Richards
(1971; program NACCESS), using a solvent probe of radius 1.4 Å. The
molecular diagrams were drawn using MOLSCRIPT (Kraulis, 1991) and
rendered with Raster3D (Bacon and Anderson, 1988). Atomic
coordinates and structure factors have been submitted to the RCSB
Protein Data Bank under accession code 1LVO.
Table III. Phasing statistics, refinement statistics and model quality

Phasing
  FOM^a before solvent flattening                     0.48
  FOM^a after solvent flattening (no averaging)       0.72
  FOM^a after solvent flattening (with averaging)     0.79
Refinement
  Resolution (Å)                                      50-1.96
  R-factor^b                                          0.210
  Rfree                                               0.256
  No. of non-hydrogen atoms [average B-value (Å²)]
    Protein (main chain)                              7198 (46.1)
    Protein (side chain)                              6613 (47.2)
    Water                                             1006 (50.3)
    MPD                                               48 (67.6)
    Sulfate                                           135 (57.1)
    Dioxane                                           54 (71.7)
  R.m.s. deviation from ideal geometry
    Bonds (Å)                                         0.017
    Angles (°)                                        1.9
    Improper dihedral angles (°)                      1.16

a FOM = figure of merit.
b R-factor = $\sum (|F_o| - k|F_c|) / \sum |F_o|$, where $k$ is the scale factor.


Proteolytic activities of TGEV Mpro mutants

For the expression of Mpro proteins with N- and C-terminal deletions
(Mpro Δ184-302, Mpro Δ200-302, Mpro Δ1-5 and Mpro Δ1-5/Δ200-302), the
corresponding Mpro coding sequences were amplified by PCR and
inserted into XmnI-BamHI-digested pMal-c2 plasmid DNA. To substitute
the Mpro residues Cys144 (by Ala) and His163 (by Leu), the
corresponding codons were replaced in pMal-Mpro by site-directed
mutagenesis using a recombination-PCR method (Yao et al., 1992). The
details of the primers used for cloning and mutagenesis and the amino
acid sequences of the recombinant proteins expressed and tested for
proteolytic activity are given in Table I. The plasmid DNAs were
transformed into E.coli TB1 cells and the recombinant proteins were
synthesized, affinity purified and cleaved with factor Xa as described
previously (Hegyi et al., 2002). The purity and structural integrity of the
mutant proteins were analysed by SDS-PAGE. The control protein for
this experiment, wild-type TGEV Mpro, was purified in an identical
manner. Enzymatic activities of the mutant proteins were measured by
using a peptide cleavage assay (Ziebuhr et al., 1997) with a peptide
substrate representing the N-terminal TGEV Mpro autoprocessing site
(H2N-VSVNSTLQ↓SGLRKMA-COOH; the arrow marks the
scissile bond that is cleaved by Mpro).
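
The expected products of the peptide cleavage assay can be written down from the residue assignments given in the text (P2 leucine, P1 glutamine, and Ser1 of mature Mpro as the P1' residue). The Python sketch below is a deliberately simplified illustration: the function name is hypothetical, and matching only a Leu-Gln pair is an assumption that ignores any further specificity determinants of the enzyme.

```python
def cleave_after_lq(peptide):
    """Split a one-letter-code peptide after the first Leu-Gln (P2-P1) pair.

    Returns (N-terminal product, C-terminal product), or None if no
    Leu-Gln pair is followed by at least one residue.
    """
    site = peptide.find("LQ")
    if site == -1 or site + 2 >= len(peptide):
        return None
    cut = site + 2  # scissile bond lies between P1 (Gln) and P1'
    return peptide[:cut], peptide[cut:]


# 15mer substrate representing the N-terminal autoprocessing site.
substrate = "VSVNSTLQSGLRKMA"
products = cleave_after_lq(substrate)
print(products)  # ('VSVNSTLQ', 'SGLRKMA')
```

The C-terminal product begins with Ser-Gly-Leu-Arg-Lys, i.e. residues 1-5 of the mature proteinase discussed above.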